COMBINED INVERSE FAST FOURIER TRANSFORM AND GUARD 
INTERVAL PROCESSING FOR EFFICIENT IMPLEMENTATION OF OFDM 

BASED SYSTEMS 

FIELD OF INVENTION 

[0001] The present invention relates generally to Inverse Fast Fourier Transform and 
Guard Interval processing in Orthogonal Frequency Division Multiplexing based 
transmission systems. In particular, combined processing allows a more efficient 
implementation of an OFDM system. 
BACKGROUND OF THE INVENTION 

[0002] Orthogonal Frequency Division Multiplexing ("OFDM") is a multi-carrier 
data transmission technique that is advantageously used in radio-frequency based 
transmitter-receiver systems. These systems may include, for example, computer 
WiFi (IEEE 802.11a) data systems. Other standards where OFDM based system is 
used are IEEE 802.16 for a Fixed Wireless Access (FWA), HiperLAN2, Digital 
Audio Broadcast (DAB), Digital Video Broadcast (DVB) and Digital Subscriber Line 
(DSL). 

[0003] OFDM systems typically divide available radio spectrum into many carriers. 
Each of the many carriers has a narrow bandwidth and is modulated with a low rate 
data stream. The carriers are closely spaced without causing inter-carrier interference 
(ICI) by ensuring that the carriers are orthogonal to each other. 

[0004] When generating an OFDM signal, each carrier is assigned a data stream. The 
data streams are converted into symbols depending on the modulation scheme. For 
each symbol to be transmitted, phase and amplitude are calculated in the frequency 
domain. The phase and amplitude are determined according to the modulation 
scheme, which may be, for example, Quadrature Amplitude Modulation (QAM), 
Quadrature Phase Shift Key (QPSK), or any other suitable modulation scheme. Once 
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the phase and amplitude are determined, they must be converted to time domain 
signals for transmission. Typically, OFDM systems use an Inverse Fast Fourier 
Transform (IFFT) to perform this conversion. The IFFT is an efficient way of 
mapping the data on to orthogonal carriers. The time domain signal is then up 
converted to the radio frequency of the appropriate carrier before transmission. 
[0005] One of the problems associated with radio transmissions is inter-symbol 
interference (ISI) due to multi-path delay interference. A reflected radio signal 
follows a longer path than a line of sight radio signal. The difference in delay caused 
by reflected signals may obscure the direct signal. OFDM systems are well-suited to 
cope with multi-path delay problems because the low data rate of each carrier 
generates long symbol periods. Tolerance to multi-path delay is enhanced by the 
addition of Guard Intervals (GI) separating the symbols. If the Guard Intervals are as 
long or longer than the differential delays expected from multi-path sources, then 
multi-path interference is effectively eliminated. To achieve a reasonable throughput, 
the OFDM symbol duration may be at least five times the Guard Interval. In order to 
avoid inter-carrier interference (ICI), the OFDM symbol is cyclically extended in the 
GI. 

[0006] Guard Interval insertion may be achieved as follows. An IFFT has an 
associated "length" corresponding to a number of coefficients for the transform. The 
Guard Interval is generated by outputting the last few IFFT output coefficients at the 
beginning of the symbol to form a cyclic prefix (CP). The size of the cyclic prefix 
varies for different applications. For example, for an IFFT with a length of 64, the 
output corresponding to the last 1 6 coefficients may be transmitted first as the cyclic 
prefix, and then the output corresponding to the 64 coefficients of the entire IFFT 
output in regular order. 
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[0007] For the addition of this CP in known systems, the IFFT output requires 
relatively large buffers. If the output of the IFFT is in bit-reversed time order, then 
two buffers of size N (N is length of IFFT) are required. If the output of the IFFT is in 
time order, then a single buffer of the length of the IFFT (N) is required. Buffers add 
expense to the system. 

[0008] What is required is an IFFT and Guard Interval insertion stage that achieves 
the required signal processing with less buffer requirements. 
SUMMARY OF THE INVENTION 

[0009] A circuit for converting frequency domain information to time domain 
information includes an Inverse Fast Fourier Transform circuit having a length of N 
coefficients. The Inverse Fast Fourier Transform circuit is adapted to receive input 
data of length N coefficients and generate output data of length N coefficients that are 
circularly shifted by m coefficients. The circuit also includes Cyclical Prefix 
Insertion circuit adapted to insert a cyclical prefix of length m. The Cyclical Prefix 
Insertion circuit includes a first switch, connected to the Inverse Fast Fourier 
Transform circuit, a buffer, having an input connected to the first switch and an 
output, the buffer having a length m, and a second switch, coupled to the first switch 
and to the buffer. The first and second switches selectively couple the output of the 
buffer and the Inverse Fast Fourier Transform circuit to an output of the second 
switch. The buffer is reduced to length m. 

[0010] The circular shift may be accomplished by multiplying the input data of the 
IFFT by a rotation factor, such as e -K 2 * p, * k * m > /N 5 where m is the amount of circular 
shift, k is the index of the IFFT input data and N is the length of the IFFT. For the 
particular case where N is a power of two and for a particular choice of a class of 
IFFT algorithms, this can be efficiently done by only modifying the control and data 
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flow for the IFFT. In an alternative embodiment, applicable only to the particular case 
just mentioned, circular shift may be achieved by reconfiguring rotator circuit, the 
addition and subtraction in butterfly circuits and re-arranging the order of the twiddle 
factors without modifying them. Twiddle factors are complex constant numbers with 
which the IFFT data in the intermediate stages are multiplied. 

[00111 In practice, an orthogonal frequency division multiplexed signal is generated 
by performing a circularly shifted Inverse Fast Fourier Transform on frequency 
domain information to generate circularly shifted time domain information, wherein 
the amount of the circular shift is the same as the length of the Guard Interval. A first 
group of samples of the shifted time domain information is passed to the circuit output 
and simultaneously stored in a buffer approximately the same size as the desired 
Guard Interval. When the Guard Interval buffer is full, the remainder of the samples 
of the Inverse Fast Fourier Transform are output, but not stored in the buffer. When 
the Inverse Fast Fourier Transform has completed outputting the shifted time domain 
information for a symbol, the buffer is then output. 

[0012] While well suited to OFDM systems having guard intervals, the present 
invention may also be applied to any IFFT or FFT system requiring a circular shift. 
Also, an IFFT designed according to the present invention may be reused as an FFT 
without eliminating the rotation. The rotation in the FFT may be removed in a 
subsequent stage. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[0013] Figure 1 shows a block diagram of an IEEE 802.1 la transmitter. 

[0014] Figure 2 shows the addition of the cyclic prefix in the Guard Interval between 

OFDM symbols. 
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[0015] Figure 3 shows the buffering required for the addition of the Guard Interval, 

when the output of IFFT is in time order. 

[0016] Figure 4 shows the timing diagram for the IFFT block. 

[0017] Figure 5 shows the buffering required for the addition of the Guard Interval, 
when the output of the IFFT is in bit-reversed order. 

[0018] Figure 6 shows the Radix-2 multipath delay commutator pipeline for the FFT 
implementation. 

[0019] Figure 7 shows the Radix-2 single path delay feedback pipeline for the FFT 
implementation. 

[0020] Figure 8 shows the Radix-4 multipath delay commutator pipeline for the FFT 
implementation. 

[0021] Figure 9 shows the Radix-4 single path delay feedback pipeline for the FFT 
implementation. 

[0022] Figure 10 shows the Radix-2 2 single path delay feedback pipeline for the FFT 
implementation. 

[0023] Figure 1 1 shows the Radix-2 3 single path delay feedback pipeline for the FFT 
implementation. 

[0024] Fig 12 shows the general scheme for reducing the buffer requirement for GI 
insertion. 

[0025] Fig 13 shows the various components of an FFT/IFFT pipeline circuit. 
[0026] Figure 14 shows the buffering required for the addition of the Guard Interval, 
when the output of the 64-point IFFT is in time order but circularly shifted by 16 
samples. 

[0027] Figure 15 shows the flow-graph for a Radix-2 2 flowgraph for a 16-point IFFT. 
[0028] Figure 16 shows the R2SDF pipeline for the 16-point IFFT. 
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[0029] Figure 17 shows the butterfly equivalence when both inputs to a butterfly are 
multiplied by the same constant. 

[0030] Figure 18 illustrates the pulling of the complex common factor and the "j" 
factor to the multiplier and rotator. 

[0031] Figure 19 illustrates the effect of multiplying the lower input of a butterfly by 
-1. 

[0032] Figure 20 shows the modified IFFT flowgraph, which results in a circularly 
shifted output. 

[0033] Figure 21 shows the modified 64-point IFFT flowgraph with 16-points 
circularly shifted output. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 
A. Known Fourier Transform Architectures 

[0034] There are a number of known schemes for OFDM systems. These schemes 
employ different IFFT structures. 

[0035] Figure 1 shows the block diagram of a known IEEE 802.1 la transmitter 100, 
which is a typical example of an OFDM system. The input bit stream is processed in 
a Scrambler 101 by a pseudo random sequence to avoid a continuous stream of ones 
or zeros. A Convolution Encoder 102 is connected to the scrambler and is a forward 
error correction module. This is followed by an Interleaver 103, which reduces burst 
errors. Other OFDM based systems have similar blocks and may have an additional 
block encoder for forward error correction, like the Reed Solomon encoder in addition 
to the convolution encoder. The output of Constellation Mapper 104 is connected to 
the IFFT stage 105. The output of the IFFT stage 105 is connected to Guard Interval 
Stage 106. The output is then connected to the RF modulation stage (not shown). 
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[0036] Referring to Figure 2, the composition of the signal output by the Guard 
Interval Stage 106 for an IEEE 802.1 la system is illustrated. In this example, there is 
a Symbol 120 and a Guard Interval 122 (comparing a cyclical prefix). The length of 
IFFT is 64 and the CP or the length of the Guard Interval 122 is 16. The total length 
of one of the output transform comprises the CP (16) plus the Symbol (64), which 
requires 80 clock cycles. Other Guard Intervals and symbol lengths may be used in 
other systems. HIPERLAN 2, for example, has a length of the IFFT and GI, which is 
the same as IEEE 802.1 la but also supports an optional CP of length 8. 
[0037] Referring to Figure 3, conventional Guard Interval insertion stage 106 is 
illustrated. The output of the IFFT stage is connected to Switch 126. Switch 126 
selectively routes the output of the IFFT stage to a buffer 128 and to a second Switch 
130. If the output of the IFFT is in normal order, a buffer of 64-samples is required. 
The output of the IFFT is routed into the buffer of length 64. The last 16 samples are 
routed both to the buffer and the output. Hence, sample numbers 48 to 63 will go out 
first. After 64 clocks the buffer will be full. The IFFT output will stop for 16 cycles. 
The sample number 0 to 63 will then be read out of the buffer. After 16 samples the 
output of the next IFFT will start coming and the process continues. So the output of 
the IFFT is discontinuous. Only 64 samples are output in the duration of 80 samples. 
But the output after GI addition is continuous. 

[0038] Figure 4 is an illustration of the timing of a conventional IFFT module. 
Before Guard Interval insertion, samples are generated only for the Symbol 120. 
After Guard Interval insertion, samples are continuous, comprising Symbol 120 and 
Guard Interval 122. 

[0039] An alternative Guard Interval insertion stage 106a is illustrated in Fig. 5. An 
input of Switch 132 is connected to the IFFT stage. Switch 132 selectively connects 
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the output of the IFFT to Buffers 134 and 136. The outputs of Buffers 134 and 136 
are connected to Switch 138. Switch 138 selectively connects the output of the 
Buffers 134 and 136 to the RF stage (not shown). If the output of the IFFT is in bit 
reversed order then two buffers of size 64 are required. The IFFT output is written 
into buffer 134 for 64 clocks. It is then read out of the buffer in the required order. 
After 64 clocks there is no output of the IFFT for 16 clocks. After 16 clocks the 
output of the next IFFT is written into buffer 136 while data is read out from the 
previous buffer 134. This process continues and the IFFT output is written alternately 
in the two buffers. 

[0040] Where the transmission signal is converted from frequency domain to time 
domain by an IFFT, the receiver uses a corresponding Fast Fourier Transform (FFT) 
to convert received time domain signals back into frequency domain information. 
The FFT and the IFFT are inverse operations of each other but are very similar in 
their implementation. Any method for implementing the FFT is generally applicable 
to the implementation of the IFFT with suitable modifications. Various architectures 
have been proposed for the implementation of the FFT in hardware. A very good 
overview of some of the pipeline architectures is given by, Shousheng He and Mats 
Torkelson, in, A New Approach To Pipeline FFT Processor, International Parallel and 
Distributed Processing Symposium, 1996. 

[0041] One example of a FFT architecture is the Radix-2 Multi-path Delay 
Commutator (R2MDC) shown in Figure 6. The basic structural elements comprise 
commutators 140, butterfly units 142, multipliers 144, and a rotator 146. It shows a 
simple implementation for any radix-2 based 16-point FFT flowgraph. The utilization 
of the butterflies and multipliers is only fifty percent. The memory utilization is also 
not efficient in this implementation. 
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[00421 A Radix-2 Single Path Delay Feedback (R2SDF) is shown in Figure 7. This 
pipeline architecture was proposed by, E. H. Wold and A. M. Despain, in Pipeline and 
Parallel Pipeline FFT Processors for VLSI Implementations, IEEE Transactions on 
computers, C-35(5):414-426, May 1984. It makes hundred percent utilization of 
memory. The efficiency of the butterfly and multiplier stages is the same as that of 
R2MDC. 

[0043] Figure 8 shows a Radix-4 Multi Path Delay Commutator (R2MDC) pipeline 
architecture for a 256-point FFT. The advantage of using a radix-4 algorithm is that 
the number of multipliers required is less as compared to a radix-2 algorithm for the 
same length of FFT. The disadvantage is that radix-4 butterfly units are more 
complex as compared to radix 2 butterfly units. 

[0044] Figure 9 shows a Radix-4 Single Path Delay Feedback (R4SDF) pipeline 
architecture for a 256-point FFT. This is the radix-4 equivalent of R2SDF pipeline. It 
has the advantage of having a hundred percent memory utilization, which is the 
characteristic of Single Delay Feedback (SDF) pipeline. It has the advantage of 
reduced number of multipliers as compared to a radix-2 algorithm based pipeline. It 
also has the disadvantage of the more complex butterfly structure for radix-4 
algorithm based pipeline architecture. 

[0045] US Patent 6,098,088 proposes a radix-2 2 algorithm in Decimation in 
Frequency (DIF) form, which has a reduced multiplicative complexity of a radix-4 
algorithm, and also the simpler butterfly unit of a radix-2 algorithm. In doing so, the 
best features of both algorithms are combined. But a similar flowgraph has been 
shown by A. M. Despain in Very Fast Fourier Transform algorithms Hardware for 
Implementation, IEEE Transactions on Computers, C-28(5):333-341, May 1979. In 
this architecture, the multiplication is achieved by using CORDIC operations. A 
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similar flow graph for Decimation in Time (DIT) form, has also been derived by R. 
Storn in Radix-2 FFT Pipeline Architecture with reduced noise to signal ratio, in IEE 
Proc. -Vis. Image Signal Process. 141(2):81-86 April 1994. 

[0046] US Patent 6,098,088 discloses a derivation of a Radix-2 2 algorithm in DIF 
form. But the DBF form of the flowgraph can be obtained from the DIT form of the 
flowgraph shown by R. Storn by re-arranging the input (time) coefficients in order 
and the output (frequency) coefficients in bit-reversed order. This flowgraph has a 
special property in which for any length of FFT, which is a power of 2, a complex 
multiplier is required only in alternate stages of the flowgraph. In all the other stages 
only a simple rotator (multiplication by factor "j") is required. Sousheng He makes 
use of this property to slightly modify the R2SDF pipeline. If the conventional radix- 
2 algorithm is used, then a rotator 146 is required only in between the last two 
butterfly units 142. In between all other butterfly units 142 a complex multiplier 144 
is required as shown in Fig. 7. But if a radix-2 2 algorithm is used, alternate rotators 
and multipliers will be required. The rotators can be pulled inside the following 
butterfly unit which results in a slightly different butterfly unit which has an 
additional switching logic to exchange the real and imaginary parts (caused as a result 
of multiplication by "j"). So this results in a new pipeline structure with a pair of 
butterfly units (one a simple radix-2 butterfly BF 21 150 and the other a radix-2 
butterfly with the switching logic BF 211 152. Two such pairs of butterfly units 
connected by a complex multiplier unit 144 form a 16-point FFT circuit. This is 
shown in Fig. 10 and this invention has been named as Radix 2 2 SDF (R2 2 SDF) 
pipeline. 

[0047] US Patent 6,061,705 extends the concept to higher radix to obtain Radix-2 3 
and Radix-2 4 and applies it to the SDF pipeline. The resulting pipeline for a 64-point 
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FFT is shown in Fig. 11. As can be seen it requires only one complex multiplier in 
the pipeline. The radix-2 2 algorithm would require two complex multipliers for a 64- 
point FFT. The radix-2 3 butterfly unit 154 itself consists of three simple radix-2 
butterfly units, two simple rotators and one fixed coefficient multiplier. This has been 
obtained by extending He's idea to decompose the radix-8 butterflies. A main point 
of this patent is that, the fixed coefficient multipliers consume much less area and 
power for a higher length FFT as compared to radix-2, radix-4 or even radix-2 2 
algorithms when applied to the SDF pipeline architecture. Similarly, the radix-2 4 
butterfly unit has four simple radix-2 butterfly units, three rotators and two, fixed 
coefficient multipliers. Hellberg also states that extending the concept to higher radix 
beyond sixteen does not necessarily give power and area advantage. 
[0048] In order to implement the IFFT, the flowgraph for the FFT needs to be 
traversed in the reverse direction (i.e., from right to left) and all the twiddle factor 
coefficients need to be complex conjugated. The pipeline also accordingly gets 
reversed. So all the pipeline architectures described for the FFT are applicable to the 
IFFT also. 

B . Application Of The Invention. 

[0049] The present invention uses a scheme in which the output of the IFFT is in 
normal order but circularly shifted by the amount of CP, in order to reduce the 
buffering requirement for the addition of the CP in the GI. The scheme is applicable 
to any size (N) of IFFT (and any length (1 to N-l) of cyclic prefix. The invention 
may be applied to any IFFT algorithm and any pipeline architecture. According to the 
scheme the output (time) of IFFT is in normal order, but is circularly shifted by the 
amount of the CP. This reduces the buffering requirement for the addition of CP in the 
GI. The required buffer size is equal to the length of the CP. An OFDM System 200 
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using this scheme is illustrated in Fig. 12. A multiplier 202 multiplies the input data 
204 by rotator coefficients 206 to get an IFFT output, which is circularly shifted. A 
Switch 210 selectively switches IFFT 208 output to Buffer 212 and Switch 214. 
[0050] In one particular case, when the length of IFFT, N, is a power of 2 and a radix 
2 or radix 2 2 algorithm is used, then this invention proposes a scheme by which there 
is no additional hardware complexity (like multiplier) added to implement the 
circularly shifted output. With reference to Fig. 12, the IFFT 208 may be modified 
(without increasing complexity), and Rotator Coefficient 206, and Multiplier 202 may 
be eliminated in order to obtain a circularly shifted output. An example illustrated 
here uses the SDF pipeline and the algorithm chosen is Radix-2 2 as this results in a 
very efficient implementation. It is shown that when this scheme is applied to the 
SDF pipeline using the radix-2 2 (or even the conventional radix-2 algorithm), it does 
not increase the complexity of the pipeline architecture. The invention is not limited 
to implementation in this pipeline architecture so long as the underlying algorithm is 
radix 2 or radix 2 2 . For this special case it is required that the input to EFFT is in bit 
reversed order. 

[0051] Two exemplary methods are shown to obtain any circular shift from (1 to N-l) 
in an N (power of 2) point EFFT. Fig. 13 shows the structure of an IFFT pipeline 
implementation. The IFFT pipeline 220 includes data path 222, which includes 
butterfly circuits 224, multiplier circuits 226 and rotator circuits 228. It has a memory 
230, which only stores the twiddle factors for the Multiplier circuit. It also has a 
control circuit 232, which controls the butterfly circuits 224, rotator circuits 228 and 
multiplier circuits 226. In one method, the twiddle factors contained in the memory 
230 connected to the multipliers 226 and the control 234 for the rotators needs to be 
changed to obtain the appropriate circular shift. This change can be restricted to the 
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first rotator and first multiplier of the pipeline. In another method, the twiddle factor 
coefficients are not changed, but reordered within the memory 230, and the control 
236 of the butterfly and control 234 of rotator circuits of the pipeline is changed. 
Depending on the amount of circular shift required, the change in the control circuit 
can range through all the stages of the pipeline. But both methods do not add any 
extra hardware block like multiplier in the pipeline. 

[0052] In the present invention, the following circular shifting property of the FFT is 
made use of to reduce the buffering requirement for GI addition 

x[((n-m)) N ] ^ FFT ^ W^^XfkJ (1) 

[0053] The double brackets indicates a modulo N operation, which results in a 

circular shift. Here, x[n] is the time domain sample and X[k] is the corresponding 
frequency domain data and 

fl/w/N = e -j(2*Pi*k*m)/N p) 

[0054] For the case of IEEE 802.11a, selecting N = 64 and m = 16 gives a circular 

— j(Pi*kV2 

shift of sixteen samples. So each sample X[k] should be multiplied by e 
Hence, if the input samples of the IFFT, X[k] is multiplied by e ~j (Pl * k)/2 5 the output 
will get circularly shifted by sixteen samples. If the IFFT output is in time order then 
the sequence of samples will be as follows, 

x[48], x[49], x[50], x[62], x[63], x[0] f x[I], x[2], , x[46],x[47]. 

[0055] In this case only a buffer of sixteen samples is needed as illustrated in Figure 
14. The first sixteen samples (x[48] to x[63]) are fed into the buffer 212 of size 
sixteen and also routed to the output. After the first sixteen samples, the next forty- 
eight samples (x[0] to x[47]) are routed directly to the output and not to the buffer. 
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After sixty-four clocks the IFFT output stops for sixteen clocks. During this period the 

contents of the buffer (x[48] to x[63]) are read out. This whole process is then 

repeated for each IFFT. It is thus seen that only a buffer of size sixteen is needed. 

Although this scheme can be applied to any IFFT algorithm in any pipeline 

architecture, it is preferred to use the R2SDF pipeline with a radix-2 or radix-2 2 

algorithm. Using this algorithm does not increase the complexity of the pipeline even 

-i(Pi*k)/2 

if each input sample X[k] needs to be multiplied by e 

[0056] The IFFT can be implemented using any FFT flowgraph by traversing it in the 
reverse direction (from right to left) and conjugating all the twiddle factor 
coefficients. The radix-2 2 flowgraph for a 16-point IFFT is shown in Figure 15. This 
particular flowgraph is chosen, as it is the preferred algorithm to be used since it has 
reduced multiplier complexity of a radix-4 flowgraph while retaining the simplicity of 
a radix-2 butterfly unit. The invention is also applicable to a radix-2 algorithm. The 
16-point BFFT is also chosen for ease of presentation. The idea is applicable to any 
length of IFFT, which is a power of two. 

[0057] In the flowgraph the dotted lines indicate a subtraction and the solid lines 
indicate an addition operation. Also W k = e' j(2 * pi * k)/16 . It is to be noted that for the 
IFFT, the input (frequency) is in bit-reversed order and output (time) is in normal 
order. This is the preferred order, as it requires a single buffer for GI addition. The 
R2SDF pipeline for the 16-point IFFT is shown in Figure 16. It is also seen that a 
multiplier 226 is needed only after the second butterfly 224 stage. After the first 224 
and third 224 butterfly stages, the multiplication is only by a factor of "j". This is 
done by first negating the imaginary part and then exchanging the real and imaginary 
parts. This does not require a multiplier and is done through phase rotators 228. The 
R2SDF pipeline requires only one multiplier, two rotators and a total buffer of 15. 
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[0058] Referring to Figure 1 5, it can be seen that any set of four consecutive inputs to 
an N point IFFT would be X(K), X(K+N/2), X(K+N/4) and X(K+3N/4). For the 16- 
point IFFT, N equals 16 and K takes on values of 0, 2, 1 and 3 for each consecutive 
set of four samples. To obtain a circular shift of m samples in the output, each input 
coefficient X(K) has to be multiplied by e -J< 2W ^ m >' N . The required multiplication 
constants are shown in the table below. 



Sample 


Multiplication Constant 


Multiplication Constant (Akm + jB k m) 


m = 1 


m = 2 


m = 3 


m =4 i 


X(K) 


e H(2*PiVmVN =Akm + j Bkm 


(Aki + jB k i) 


r(Ak2 + jBk2) 


r(Ak3 + jB k 3) 


1*(Ak4 + jBk 4 ) 


X(K+N/2) 


e -j[2W(k+N/2)*m]/N = e -j(2*Prk*m)/N <e -j(Pi'm) 


-r(Aki+jB k i) 


1*(Ak2 + jBk2) 


-r(Ak 3 + iB k3 ) 


1*(Ak4 + iB k 4) 


X(K+N/4) 


e -j(2*Pi*(k+N/4)*m]/N = @ -j(2*Pi*k*m)/N >e -j(Pi*m)/2 


-j*(A k1 + jB k i) 


-1*(A k2 + jBk2) 


i*(Ak 3 + jB k3 ) 


1*(A k 4 + iB k4 ) 


X(K+3N/4) 


e -j[2*Pi*(k+3N/4)*m]/N = e -j(2*Pi*k*m)/N <e -j(3*Pi*m)/2 


J*(Aki+jB k i) 


-1*(Ak2 + jBk2) 


-J*(Ak3 + jBk 3 ) 


1*(A k 4 + jBk4) 



[0059] It can be seen that for a generic case (any k, m and N that is a power of 2) the 
constant, e" ( 2 * Pl * k * m ) /N 9 w m be a complex number equal to (Akm + jBkm). But 
because of the bit reversed ordering, this complex number forms a constant factor in 
the set of four multiplication constants for the four consecutive inputs. The additional 
terms are 1, e ' j(Pi * m) , e -i< pi * m > /2 and e -* 3 * pi * m > /2 for the four consecutive inputs. 
The last four columns of table gives the values of these additional terms for special 
case of m = 1, 2, 3 and 4. For higher values of m, the values of the additional 
constants will repeat cyclically. The value of the additional constant for m=l is same 
as that for m= 5 and the value for m=2 is the same as that for m=6 and so on. But the 
value of the complex common factor (Akm + jBkm), itself will vary with m. If both the 
inputs to a butterfly operation are multiplied by the same constant then the constant 
can be multiplied after the butterfly operation. This butterfly equivalence is shown in 
Figure 17. Since this factor is common across four inputs, the constant can be pulled 
across two butterfly operations. After the second butterfly stage there is a multiplier 
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in the pipeline. So the complex common factor of the constant required for circular 
shift can be merged with the twiddle factor constants in the multiplier after the second 
butterfly. For the m=l and m=3 cases, there is an additional factor "j" associated with 
the third and fourth constants X(K+N/4) and X(K+3N/4). This factor "j" can be 
pulled to the rotator present after the first butterfly stage. 

[0060] This leaves a factor of-1 in some cases. In the butterfly, if the lower input is 
multiplied by -1, it only means that the output of addition and subtraction are 
interchanged. In the actual implementation, in the pipeline this requires the output of 
the subtractor operation to be sent out first followed by the output of the adder 
operation, which is just the reverse of the normal case. So this also involves just a 
small modification of the control logic. Thus, it is seen that only the twiddle factor 
coefficients are modified to get a desired circular shift, but the complexity of the 
pipeline is not increased. 

[0061] The pulling of the factor "j", and the complex common factor "Akm+jBkm", for 
the m=l case, across the butterfly unit, is illustrated in Figure 18. It is to be noted that 
the factor -j is pulled so that a factor of-1 is retained for the lower input of the lower 
butterfly. The effect of this -1 factor is illustrated in Figure 19. 

[0062] For a 16-point IFFT (N = 16), if a circular shift of N/4 (m = 4) is required, 
then the additional terms, e ~ j(Pi * m) , e - j(Pi * m)/2 and e -* 3 * pi * m > /2 , all take a value of 
unity. The complex common factor, Akm + jBkm , itself takes values of 1, -1, -j and j 
for values of k equal to 0, 2, 1 and 3 respectively. So in order to achieve a circular 
shift of four, in the 16-point IFFT flowgraph of Figure 15, the first four coefficients 
should be multiplied by unity, the next four by -1, the next four by -j and the final 
four by j. Since there are no complex constants, the factors of -1, j and -j need to be 
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pulled only till the rotator after the first butterfly. The modified flowgraph with a 
circular shift of four is shown in Figure 20. 

[0063] For the special case of IEEE 802.11a, the size of IFFT is sixty-four (N = 64) 
and the circular shift required is sixteen (m = N/4 = 16). So it reduces to the case just 
considered for the 16-point IFFT with a circular shift of four. For the IEEE 802.1 la 
case, the first sixteen samples are multiplied by unity, the next sixteen by -1, the next 
sixteen by -j and the last sixteen by j. These factors can be pulled in to merge with 
the first rotator coefficients. The modified flowgraph for a 64-point IFFT, with a 
circular shift of 16 is shown in Figure 21 . 

[0064] An alternative implementation is now described, to obtain any circular shift (1 
to N-l) in an N point IFFT. This method is also applicable for a radix-2 2 (or radix-2) 
algorithm. The implementation is described here, using a SDF pipeline, with respect 
to the 16-point IFFT, using the radix-2 2 algorithm as shown in Figure 15. 
[0065] In the normal implementation, for the SDF pipeline, in the butterfly unit, 
always the result of the addition operation is sent out first and the result of the 
subtraction operation is sent later. In the butterfly unit an addition and a subtraction 
operations are done in a single cycle. The result of the additions is sent out first and 
always the result of the subtraction are stored in the delay feedback and sent out later 
depending on the length of feedback buffer. In Figure 16 the first butterfly unit (224) 
sends out the result of addition and subtraction in alternate cycles as it has a single 
buffer in the delay feedback. The second butterfly unit (224) sends out two addition 
results followed by two subtraction results as it has a buffer of size two in the delay 
feedback. The third butterfly unit (224) sends out four addition results followed by 
four subtraction results and so on. 
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[0066] In the proposed alternative method, this control is modified to obtain the 
required circular shift without modifying the twiddle factors. Depending on the 
amount of circular shift required, in some stages, the result of the subtraction is sent 
out first and the result of the addition is stored in the delay buffer and sent out later. 
The following table indicates the required modification to obtain any circular shift 
(from 1 to N-l) for the radix-2 2 16-point IFFT algorithm in a SDF pipeline shown in 
Figure 16. 



Circular 
Shift 


Stage 1 


Stage 2 


Stage 3 


Stage 4 


Oor 16 


(+)x8 


(++)x4 


(++++) x2 


(1 1 1 1 1 1 1 1) x 1 


1 


(-)x8 


(-+)x4 


(- +++) x 2 


(- mill) x 1 


2 


(+)x8 


(--)x4 


(--++) x 2 


(. . + 11 1 1 1) x 1 


3 


(-)x8 


(+-)x4 


(---+) x 2 


(. _ _ Mill) X 1 


4 


(+)x8 


(++) x 4 


(----)x2 


( ++++) x l 


5 


(-)x8 


(-+)x4 


(+- - -) x 2 


( +++) X 1 


6 


(+)x8 


(--)x4 


(++--) x 2 


( - ++) X 1 


7 


(-)x8 


(+-)x4 


(+++ -)x2 


( +) X 1 


8 


(+)x8 


(++)x4 


(++++) x 2 


( )xl 


9 


(-)x8 


(-+)x4 


(- +++) x 2 


(+ ) X 1 


10 


(+)x8 


(- -)x4 


(--++) x 2 


(++ ) x 1 


11 


(-)x8 


(+ -)x4 


(---+) x 2 


(+-H- ) X 1 1 


12 


(+)x8 


(++)x4 


(----)x2 


(III! ) X 1 ! 


13 


(-)x8 


(-+)x4 


(+- - -) x 2 


(Mill - - -) X 1 


14 


(+)x8 


(--)x4 


(++- -) x 2 


(MINI - -) X 1 ] 


15 


(-)x8 


(+-)x4 


(+++-) x 2 


(1 1 1 1 1 1 I-) X 1 
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[0067] In the table, the first column indicates the amount of circular shift. The 
remaining columns show the control required for each of the butterfly stages. For a 
16-point EFFT, the butterfly unit in every stage performs eight operations (i.e. eight 
additions and eight subtractions). A "+" indicates that the result of the addition is sent 
out to the next stage and result of the subtraction is sent to the feedback buffer. This 
is like the normal operation. But a "-"indicates the exact reverse where the result of 
the subtraction is sent out first and the result of the addition is sent to the feedback 
buffer. The "x 8" for stage 1 indicates that the pattern in the brackets is repeated 8 
times. This is used just to highlight the regular pattern in the control of the butterfly 
units. 

[0068] The first row indicates the control for a circular shift of zero or sixteen, which 
is the regular case of no circular shift. In this case all the butterfly units always send 
out the result of the addition first. Hence, the control for all the stages has only the 
sign. 

[0069] Consider the second row, which shows the controls, required to obtain a 
circular shift of one. For the first stage, in all the eight butterfly operations the result 
of the subtraction should be sent out first. For the second stage, in the first operation 
the result of the subtraction should be sent first and for the second operation the result 
of the addition should be sent first. This pattern should be repeated four times to 
complete the eight operations. For the third stage, in the first operation the result of 
the subtraction is sent out first and in the next three operations the result of the 
addition is sent out. This pattern is repeated twice to complete the eight operations. 
For the last stage, only in the first operation the result of the subtraction is sent out 
first. In the remaining seven operations the result of the addition is sent out first. 
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[0070] It can be seen that there is a very regular pattern in the control required for 
getting the different amount of circular shifts. The control required for any higher 
length N which is a power of 2, can easily be obtained by extending the table to 
follow the regular pattern. For the special case of IEEE 802.11a, N=64 and the 
circular shift required is 16. For this special case the following table gives the 
required control for all the stages of a radix 2/radix 2 2 pipeline. 



Stage 


Control 


Stage 1 


(+)x32 


Stage 2 


(++) x 16 


Stage 3 


(++++) x 8 


Stage 4 


(++++ ++++) x 4 


Stage 5 


(- 


--)x2 


Stage 6 


(- - 


- - ++++ ++++ ++++ ++++ )xl 



[0071] This invention describes a scheme for implementing an IFFT in the transmitter 
of OFDM based systems. The output of the IFFT is circularly shifted to minimize 
buffer requirement for GI addition. If the same pipeline is reversed and the twiddle 
factors are conjugated then it would implement an FFT for the receiver. This is done 
to reuse the hardware blocks of the transmitter in the receiver. Since the original 
pipeline for IFFT was designed for a circularly shifted output, the reversed pipeline 
for FFT would give the frequency domain coefficients rotated by W km/N as indicated 
by equation 1 . This does not cause a problem in OFDM based receivers as this factor 
would be eliminated after channel equalization. 

[0072] This invention describes a scheme for circularly shifting the output of IFFT, 
with reference to GI insertion in OFDM based transmission systems. But the 
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invention described here is not limited to OFDM systems alone, but can be used to 
obtain circular shift in FFT/IFFT used in any system. 
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